
WHAT IS CLAIMED IS: 

1. A method of identifying one or more positions in a polymer family, the method 
comprising: 

(a) accessing data representing a multiple sequence alignment (MSA) of a 
plurality of polymer sequences; and 

(b) identifying one or more positions within the MSA that have statistically 
significant conservation energy values using the following equation: 

\[ \Delta G_i^{stat} = kT^* \sqrt{\sum_x \left( \ln \frac{P_i^x}{P_{MSA}^x} \right)^2} \]

wherein: 

i is a position in the MSA; 

\(\Delta G_i^{stat}\) is the conservation energy value for position i; 

\(P_i^x\) is the probability of monomer x at position i; 

\(P_{MSA}^x\) is the probability of monomer x in the MSA; and 
kT* is an energy unit, where k is Boltzmann's constant. 
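
By way of non-limiting illustration, the conservation energy calculation of claim 1 may be sketched in Python. The sketch assumes that the conservation energy at position i is kT* times the square root of the sum, over monomers x, of the squared logarithm of the ratio of the probability of x at position i to the probability of x in the MSA. The function name `conservation_energies`, the use of raw frequencies as probabilities, the pooling of all columns to obtain the MSA-wide probability, and the skipping of monomers absent from a column are assumptions of the sketch and are not recited in the claim.

```python
import math
from collections import Counter


def conservation_energies(msa, kt_star=1.0):
    """Return one conservation energy value per MSA position.

    Assumptions of this sketch: probabilities are raw frequencies, the
    MSA-wide probability pools every column, and monomers absent from a
    column are skipped so that log(0) is never evaluated.
    """
    n_seqs = len(msa)
    n_cols = len(msa[0])
    pooled = Counter("".join(msa))      # monomer counts over the whole MSA
    total = n_seqs * n_cols
    energies = []
    for i in range(n_cols):
        column = Counter(seq[i] for seq in msa)
        squared_sum = 0.0
        for x, count in column.items():
            p_i = count / n_seqs        # probability of x at position i
            p_msa = pooled[x] / total   # probability of x in the MSA
            squared_sum += math.log(p_i / p_msa) ** 2
        energies.append(kt_star * math.sqrt(squared_sum))
    return energies


# Toy alignment: position 0 is fully conserved, positions 1-4 are not.
toy_msa = ["AACGT", "ACGTA", "AGTAC", "ATACG"]
toy_energies = conservation_energies(toy_msa)
```

In this toy alignment the fully conserved position 0 receives the largest value, ln 2.5 in units of kT*.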



2. The method of claim 1, wherein the method is executed using a machine. 

3. A program storage device readable by the machine of claim 2 and encoding 
instructions executable by the machine for performing the operations recited in 
the claim. 

4. The method of claim 1, further comprising generating a graphical image of the 
conservation energy values. 

5. The method of claim 1, wherein the polymer sequences comprise protein 
sequences. 

6. The method of claim 1, wherein monomer x comprises amino acid x. 

7. The method of claim 1, wherein the data accessed comprises data from the PDZ 
domain family. 

8. The method of claim 1, wherein the data accessed comprises data from the p21 ras 
domain family. 



9. The method of claim 1, wherein the data accessed comprises data from the 
hemoglobin domain family. 

10. A method of identifying one or more positions in a polymer family, the method 
comprising: 

(a) accessing data representing a multiple sequence alignment (MSA) of a 
plurality of polymer sequences; 

(b) calculating a conservation energy value for each position in the MSA 
using the following equation: 



\[ \Delta G_i^{stat} = kT^* \sqrt{\sum_x \left( \ln \frac{P_i^x}{P_{MSA}^x} \right)^2} \]



wherein: 

i is a position in the MSA; 

\(\Delta G_i^{stat}\) is the conservation energy value for position i; 

\(P_i^x\) is the probability of monomer x at position i; 

\(P_{MSA}^x\) is the probability of monomer x in the MSA; 
kT* is an energy unit, where k is Boltzmann's constant; and 
(c) identifying one or more positions within the MSA that have statistically 
significant conservation energy values. 
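
The claims do not prescribe a particular test of statistical significance. Purely as an assumed example, positions might be flagged when their conservation energy value lies more than a chosen number of standard deviations above the mean over all positions. The function name `significant_positions` and the default cutoff of 2.0 are hypothetical choices made for this sketch.

```python
import statistics


def significant_positions(energies, z_cutoff=2.0):
    """Return indices whose energy exceeds the mean by z_cutoff std devs.

    The z-score criterion is an assumption of this sketch; the claims
    leave the significance test open.
    """
    mean = statistics.fmean(energies)
    spread = statistics.pstdev(energies)
    if spread == 0:
        return []                       # all positions identical: nothing stands out
    return [i for i, e in enumerate(energies) if (e - mean) / spread > z_cutoff]


# Hypothetical per-position values with one clear outlier at index 9.
example_energies = [0.10, 0.20, 0.15, 0.12, 0.18, 0.11, 0.16, 0.14, 0.13, 3.00]
hits = significant_positions(example_energies)
```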

11. The method of claim 10, wherein the method is executed using a machine. 

12. A program storage device readable by the machine of claim 11 and encoding 
instructions executable by the machine for performing the operations recited in 
the claim. 



13. The method of claim 10, further comprising generating a graphical image of the 
conservation energy values. 

14. The method of claim 10, wherein the polymer sequences comprise protein 
sequences. 

15. The method of claim 10, wherein monomer x comprises amino acid x. 

16. The method of claim 10, wherein the data accessed comprises data from the PDZ 
domain family. 

17. The method of claim 10, wherein the data accessed comprises data from the p21 ras 
domain family. 



18. The method of claim 10, wherein the data accessed comprises data from the 
hemoglobin domain family. 

19. A method useful in identifying interacting monomers in a polymer family, the 
method comprising: 

(a) accessing data representing a multiple sequence alignment (MSA) of a 
plurality of polymer sequences; 

(b) calculating a respective conservation energy value for each position in the 
MSA using the following equation: 



\[ \Delta G_i^{stat} = kT^* \sqrt{\sum_x \left( \ln \frac{P_i^x}{P_{MSA}^x} \right)^2} \]



wherein: 
i is a position in the MSA; 

\(\Delta G_i^{stat}\) is the conservation energy value for position i; 

\(P_i^x\) is the probability of monomer x at position i; 

\(P_{MSA}^x\) is the probability of monomer x in the MSA; 

kT* is an energy unit, where k is Boltzmann's constant; 

(c) perturbing a position in the MSA other than position i; 

(d) re-calculating the respective conservation energy value for each position 
in the MSA to yield a perturbed conservation energy value; and 

(e) identifying positions within the MSA that have statistically significant 
differences between their respective conservation energy values and their 
perturbed conservation energy values. 

20. The method of claim 19, wherein the perturbing includes: 

selecting a position j in the MSA; and 

selecting a subset of the MSA, the subset having one or more monomers at 
position j in the MSA. 

21. The method of claim 20, wherein the re-calculating and identifying include: 

for each position in the MSA, calculating a vector difference \(\Delta\Delta G^{stat}\) between the 
conservation energy value of the MSA and a conservation energy value of 
the subset of the MSA using the following equation: 

\[ \Delta\Delta G_{i,j}^{stat} = kT^* \sqrt{\sum_x \left( \ln \frac{P_{i|\delta j}^x}{P_{MSA|\delta j}^x} - \ln \frac{P_i^x}{P_{MSA}^x} \right)^2} \]

wherein: 

\(\Delta\Delta G_{i,j}^{stat}\) is the vector difference in conservation energy values for 
position i; 

\(P_{i|\delta j}^x\) is the probability of monomer x at position i of the subset; 

\(P_{MSA|\delta j}^x\) is the probability of monomer x in the subset; and 

identifying positions within the MSA that have statistically significant \(\Delta\Delta G^{stat}\) 
values. 
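
By way of non-limiting illustration, the perturbation of claim 20 and the vector difference of claim 21 may be sketched in Python. The sketch assumes that the vector difference at position i is kT* times the square root of the sum, over monomers x, of the squared difference between the log probability ratio in the subset and the log probability ratio in the full MSA. The function names `select_subset` and `coupling_energy`, and the probability values used in the example, are hypothetical. The probabilities are supplied by the caller and must be nonzero; how they are estimated is left open here.

```python
import math


def select_subset(msa, j, monomers):
    """Keep the sequences whose monomer at position j is in `monomers`."""
    return [seq for seq in msa if seq[j] in monomers]


def coupling_energy(p_i, p_msa, p_i_sub, p_msa_sub, kt_star=1.0):
    """Vector difference in conservation energy at one position i.

    Each argument maps monomer x to a nonzero probability: at position i
    and MSA-wide for the full alignment, then the same two for the subset.
    """
    squared_sum = 0.0
    for x in p_i:
        full = math.log(p_i[x] / p_msa[x])
        perturbed = math.log(p_i_sub[x] / p_msa_sub[x])
        squared_sum += (perturbed - full) ** 2
    return kt_star * math.sqrt(squared_sum)


# Perturbation: keep only sequences with monomer "A" at position j = 0.
toy_msa = ["AAC", "AAG", "CCC", "CCG"]
subset = select_subset(toy_msa, j=0, monomers={"A"})

# Hypothetical probabilities for a two-monomer alphabet.
background = {"A": 0.5, "C": 0.5}
unchanged = coupling_energy(background, background, background, background)
shifted = coupling_energy(background, background, {"A": 0.9, "C": 0.1}, background)
```

A position whose distribution is the same in the subset and in the full MSA yields zero, while a position whose distribution shifts in the subset yields a positive value.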

22. The method of claim 21, further comprising generating a graphical image of the 
\(\Delta\Delta G^{stat}\) values. 

23. The method of claim 19, wherein the method is executed using a machine. 

24. A program storage device readable by the machine of claim 23 and encoding 
instructions executable by the machine for performing the operations recited in 
the claim. 

25. The method of claim 19, wherein the polymer sequences comprise protein 
sequences. 

26. The method of claim 19, wherein monomer x comprises amino acid x. 

27. The method of claim 19, wherein the data accessed comprises data from the PDZ 
domain family. 

28. The method of claim 19, wherein the data accessed comprises data from the p21 ras 
domain family. 

29. The method of claim 19, wherein the data accessed comprises data from the 
hemoglobin domain family. 

30. A machine-executed method of quantitatively identifying interacting amino acids 
in a protein family, the method comprising: 

(a) accessing data representing a multiple sequence alignment (MSA) of a 
plurality of protein sequences that are members of a common structural 
family; 

(b) for each position in the MSA, calculating a respective conservation energy 
value using the following equation: 



\[ \Delta G_i^{stat} = kT^* \sqrt{\sum_x \left( \ln \frac{P_i^x}{P_{MSA}^x} \right)^2} \]



wherein: 

i is a position in the MSA; 

\(\Delta G_i^{stat}\) is the conservation energy value for position i; 

\(P_i^x\) is the probability of amino acid x at position i; 

\(P_{MSA}^x\) is the probability of amino acid x in the MSA; 

kT* is an energy unit, where k is Boltzmann's constant; 

(c) selecting a position j in the MSA; 

(d) selecting a subset of the MSA, wherein the subset has one or more amino 
acids at position j in the multiple sequence alignment; 

(e) for each position in the multiple sequence alignment, calculating a vector 
difference between the respective conservation energy value of the 
multiple sequence alignment and the respective conservation energy value 
of the subset of the multiple sequence alignment; and 

(f) identifying positions within the MSA that have statistically significant 
vector differences. 

31. A method of analyzing data comprising: 

(a) providing at least one protein having a crystal structure and multiple 
positions; 

(b) solving the crystal structure of the at least one protein; and 

(c) identifying pathways between interacting positions on the at least one 
protein. 

32. A method of analyzing the effect of perturbation on a protein, comprising: 

(a) accessing data representing at least one protein and at least one perturbed 
protein, both proteins having at least one identical atom; 

(b) calculating a quantity of change \(\Delta_{struct}\) to the atom using the following 
equation: 



\[ \Delta_{struct} = \frac{|r_{mut}|}{\sqrt{\sigma_{mut}^2 + \sigma_{wt}^2}} \]

wherein: 

\(|r_{mut}|\) is the magnitude of a vector connecting the position of the 
atom in the at least one perturbed protein and the position 
of the atom in the at least one protein; 

\(\sigma_{mut}\) is a standard deviation of the atom in the at least one 
perturbed protein; and 

\(\sigma_{wt}\) is a standard deviation of the atom in the at least one protein. 
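
By way of non-limiting illustration, the quantity of change to an atom may be sketched in Python. The sketch assumes that the quantity is the magnitude of the displacement vector divided by the square root of the sum of the squared standard deviations of the atom in the perturbed and unperturbed proteins. The function name `structural_change`, the representation of positions as (x, y, z) tuples, and the example coordinates are hypothetical.

```python
import math


def structural_change(pos_wt, pos_mut, sigma_wt, sigma_mut):
    """Displacement of one atom, scaled by its combined positional spread.

    The normalization by sqrt(sigma_mut^2 + sigma_wt^2) is an assumption
    of this sketch.
    """
    r_mut = math.dist(pos_mut, pos_wt)  # magnitude of the displacement vector
    return r_mut / math.sqrt(sigma_mut ** 2 + sigma_wt ** 2)


# Hypothetical atom displaced by 0.5 with a combined spread of 0.5.
change = structural_change(
    pos_wt=(0.0, 0.0, 0.0), pos_mut=(0.3, 0.4, 0.0), sigma_wt=0.3, sigma_mut=0.4
)
```

Under these assumed inputs the displacement equals the combined spread, so the result is approximately 1.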



33. A method of analyzing data, comprising: 

(a) accessing data representing at least one protein, a first perturbation of the 
at least one protein yielding a first perturbed protein, a second perturbation 
of the at least one protein yielding a second perturbed protein, and a 
double perturbation of the at least one protein yielding a double perturbed 
protein, the double perturbation comprising both the first and second 
perturbations, the proteins each having at least one identical atom; 

(b) calculating a quantity of structural coupling \(\Delta\Delta_{struct}\) between the first and 
second perturbations using the following equation: 

\[ \Delta\Delta_{struct} = \frac{|r_{mut1} - r_{mut1|mut2}|}{\sqrt{\sigma_{wt}^2 + \sigma_{mut1}^2 + \sigma_{mut2}^2 + \sigma_{mut1|mut2}^2}} \]

wherein: 

\(r_{mut1}\) is a vector connecting the position of the atom in the first 
perturbed protein and the position of the atom in the at least 
one protein; 

\(r_{mut1|mut2}\) is a vector connecting the position of the atom in the 
double perturbed protein and the position of the atom in the 
second perturbed protein; 

\(\sigma_{wt}\) is a standard deviation of the atom in the at least one protein; 

\(\sigma_{mut1}\) is a standard deviation of the atom in the first perturbed 
protein; 

\(\sigma_{mut2}\) is a standard deviation of the atom in the second perturbed 
protein; and 

\(\sigma_{mut1|mut2}\) is a standard deviation of the atom in the double 
perturbed protein. 
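
By way of non-limiting illustration, the structural coupling between two perturbations may be sketched in Python. The sketch assumes that the coupling is the magnitude of the difference between the two displacement vectors divided by the square root of the sum of the four squared standard deviations. The function name `structural_coupling`, the tuple representation of positions, and the example coordinates are hypothetical.

```python
import math


def structural_coupling(wt, mut1, mut2, double, sigmas):
    """Coupling between two perturbations at one atom.

    `wt`, `mut1`, `mut2` and `double` are (x, y, z) positions of the atom;
    `sigmas` holds the four positional standard deviations. The square
    root normalization is an assumption of this sketch.
    """
    # Effect of the first perturbation on the unperturbed protein.
    r_mut1 = [a - b for a, b in zip(mut1, wt)]
    # Effect of the first perturbation in the presence of the second.
    r_mut1_given_mut2 = [a - b for a, b in zip(double, mut2)]
    difference = math.dist(r_mut1, r_mut1_given_mut2)
    return difference / math.sqrt(sum(s ** 2 for s in sigmas))


spread = (0.5, 0.5, 0.5, 0.5)           # hypothetical standard deviations
wt, mut1, mut2 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
additive = structural_coupling(wt, mut1, mut2, (1.0, 1.0, 0.0), spread)
coupled = structural_coupling(wt, mut1, mut2, (3.0, 1.0, 0.0), spread)
```

When the two perturbations act independently the displacement vectors coincide and the coupling is zero; any departure from additivity gives a positive value.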

34. A method of analyzing microarray data comprising: 

(a) accessing microarray data representing an expression level of at least one 
gene, an expression level of the at least one gene resulting from a first 
perturbation, an expression level of the at least one gene resulting from a 
second perturbation, and an expression level of the at least one gene 
resulting from a double perturbation comprising both the first and second 
perturbations; and 

(b) calculating a degree of coupling \(\Delta\Delta E\) between the first and second 
perturbations using the following equation: 

\[ \Delta\Delta E = kT \ln\left( \frac{f_2}{f_1} \right) \]

wherein: 

\(f_1\) is the fold effect of the gene due to the first perturbation relative 
to the at least one gene; 

\(f_2\) is the fold effect of the gene due to the double perturbation 
relative to the second perturbation; and 
kT is an energy unit, where k is Boltzmann's constant. 
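
By way of non-limiting illustration, the degree of coupling between two perturbations may be sketched in Python. The sketch assumes that each fold effect is a ratio of expression levels and that the degree of coupling is kT times the natural logarithm of the second fold effect divided by the first; the ordering of that ratio is an assumption, and reversing it changes only the sign. The function name `expression_coupling` and the example expression levels are hypothetical.

```python
import math


def expression_coupling(e_base, e_first, e_second, e_double, kt=1.0):
    """Degree of coupling between two perturbations for one gene.

    Arguments are expression levels: unperturbed, after the first
    perturbation, after the second, and after both. The ratio order
    f2 / f1 is an assumption of this sketch.
    """
    f1 = e_first / e_base               # first perturbation vs. unperturbed
    f2 = e_double / e_second            # double perturbation vs. second alone
    return kt * math.log(f2 / f1)


# Hypothetical expression levels in arbitrary units.
independent = expression_coupling(100.0, 200.0, 50.0, 100.0)
coupled = expression_coupling(100.0, 200.0, 50.0, 400.0)
```

When the first perturbation has the same fold effect with or without the second, the two fold effects are equal and the degree of coupling is zero.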